3 research outputs found

    FPGA-based stereo vision system for autonomous driving

    The project consists of the design and implementation of a real-time stereo vision image sensor for autonomous driving systems using an FPGA. The sensor outputs a real-time depth image from an input of two grayscale luminance images, which makes further processing much easier and faster. The final objective of the project is to develop a standalone prototype for deployment on an autonomous vehicle, but the system will first be built on an existing FPGA platform to prove its viability. Two low-cost digital cameras will be used as input sensors, and the output image will be transmitted to a PC.
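The core computation such a sensor performs can be sketched in software: for each pixel of the left image, find the horizontal shift (disparity) that best matches the right image, then convert disparity to depth via depth = f·B/d. The following is a minimal NumPy sketch of naive block matching, not the FPGA design from the abstract; the function names, block size, and camera parameters in the usage example are illustrative assumptions.

```python
import numpy as np

def disparity_sad(left, right, max_disp=16, block=5):
    """Naive block matching: for each left-image pixel, pick the
    horizontal shift d that minimises the sum of absolute
    differences (SAD) between block-sized patches."""
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), dtype=np.float32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y-half:y+half+1, x-half:x+half+1].astype(np.int32)
            costs = [np.abs(patch - right[y-half:y+half+1,
                                          x-d-half:x-d+half+1].astype(np.int32)).sum()
                     for d in range(max_disp)]
            disp[y, x] = int(np.argmin(costs))
    return disp

def depth_from_disparity(disp, focal_px, baseline_m):
    """Classic pinhole stereo relation: depth = f * B / d,
    valid where disparity is nonzero."""
    with np.errstate(divide='ignore'):
        return np.where(disp > 0, focal_px * baseline_m / disp, 0.0)
```

An FPGA implementation pipelines this per-pixel search in hardware so a depth value is produced every clock cycle; the sketch only shows the data dependency between the two input images and the depth output.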

    An energy-efficient GeMM-based convolution accelerator with on-the-fly im2col

    Systolic array architectures have recently emerged as successful accelerators for deep convolutional neural network (CNN) inference. Such architectures can efficiently execute general matrix–matrix multiplications (GeMMs), but computing convolutions with this primitive involves transforming the 3-D input tensor into an equivalent matrix, which can inflate the input data, increasing the off-chip memory traffic that is critical for energy efficiency. In this work, we propose a GeMM-based systolic array accelerator that uses a novel data feeder architecture to perform on-chip, on-the-fly convolution lowering (also known as im2col), supporting arbitrary tensor and kernel sizes as well as strided and dilated (or atrous) convolutions. By using our data feeder, we reduce memory transactions and required bandwidth on state-of-the-art CNNs by a factor of two, while adding an area and power overhead of only 4% and 7%, respectively. An application-specific integrated circuit (ASIC) implementation of our accelerator in 22-nm technology fits in less than 1.1 mm² and reaches an energy efficiency of 1.10 TFLOPS/W with 16-bit floating-point arithmetic. This work was supported in part by MCIN/AEI/10.13039/501100011033 under Project PCI2020-134984-2, in part by the European Union NextGenerationEU/PRTR, in part by the European Union's Horizon Europe Programme under the Key Digital Technologies (KDT) Joint Undertaking (JU) under Grant 101097224, and in part by the Spanish Ministry of Science and Innovation through MCIN/AEI/10.13039/501100011033 under Grant PID2019-107255GB-C21. Peer reviewed. Postprint (author's final draft).
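The im2col lowering the abstract refers to can be illustrated in software. The NumPy sketch below (an illustration only, not the accelerator's hardware data feeder; function names are hypothetical) unrolls each receptive field into a matrix column so that the convolution becomes a single GeMM. It also makes the data inflation visible: the lowered matrix holds roughly kh·kw overlapping copies of the input, which is exactly the off-chip traffic an on-the-fly feeder avoids.

```python
import numpy as np

def im2col(x, kh, kw, stride=1, dilation=1):
    """Lower a (C, H, W) tensor to a (C*kh*kw, oh*ow) matrix whose
    columns are the unrolled receptive fields (im2col transform),
    supporting strided and dilated convolutions."""
    c, h, w = x.shape
    eff_kh = (kh - 1) * dilation + 1  # effective kernel extent
    eff_kw = (kw - 1) * dilation + 1
    oh = (h - eff_kh) // stride + 1
    ow = (w - eff_kw) // stride + 1
    cols = np.empty((c * kh * kw, oh * ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            patch = x[:, i*stride : i*stride + eff_kh : dilation,
                         j*stride : j*stride + eff_kw : dilation]
            cols[:, i*ow + j] = patch.ravel()
    return cols, oh, ow

def conv_gemm(x, weights, stride=1, dilation=1):
    """Convolution as one GeMM: (K, C*kh*kw) @ (C*kh*kw, oh*ow)."""
    k, c, kh, kw = weights.shape
    cols, oh, ow = im2col(x, kh, kw, stride, dilation)
    out = weights.reshape(k, -1) @ cols
    return out.reshape(k, oh, ow)
```

For a 3x3 kernel the lowered matrix is up to nine times larger than the input tensor; materialising it in off-chip memory is what the proposed on-chip data feeder is designed to avoid.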

    Two examples of approximate arithmetic to reduce hardware complexity and power consumption

    © 2022 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. As the end of Moore's Law approaches, electronic system designers must find ways to keep up with the ever-increasing computational demands of the modern era. Some computationally intensive applications, such as multimedia processing, computer vision, and artificial intelligence, share a feature that makes them especially suitable for hardware-level optimizations: their inherent robustness to noise and errors. This allows circuit designers to relax the constraint that arithmetic operations, such as multiplications and additions, must be completely accurate. Instead, approximations can be used in the arithmetic units, enabling system-level reductions in hardware area and power consumption, as well as improvements in performance, while hardly affecting the output of the final application. In this work, we explore two approximate arithmetic techniques. First, we consider approximations at the circuit design level by implementing several approximate multiplier units and evaluating their accuracy when used to execute YOLOv3, a state-of-the-art camera-based object detection deep neural network. Second, we apply the technique of overscaling to induce approximations in adder circuits by aggressively undervolting and overclocking them, and we compare the behavior of exact and approximate adders under these conditions.
    We find that, on the one hand, some approximate multipliers can execute the YOLO network with almost no effect on the results, and on the other, approximate adder circuits are much more resilient to overscaling than exact adders. This work was partially supported by the Spanish MCIN/AEI/10.13039/501100011033, Project PID2019-103869RB-C33. Peer reviewed. Postprint (author's final draft).
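One simple circuit-level approximation of this kind is operand truncation: drop the low-order bits of each operand before multiplying, which in hardware removes the low-significance partial-product rows. The sketch below is only an illustration of the general idea, not one of the specific multiplier designs evaluated in the paper, and the bit widths and error metric are illustrative assumptions.

```python
import random

def approx_mul(a, b, t=4):
    """Truncation-based approximate multiplier: discard the t
    low-order bits of each non-negative operand, multiply the
    truncated values, then shift back into place. Always
    underestimates (or equals) the exact product."""
    return ((a >> t) * (b >> t)) << (2 * t)

# Quick software accuracy sweep over random 16-bit operands,
# mimicking how an approximate unit's error profile is characterised.
random.seed(0)
errs = []
for _ in range(10000):
    a, b = random.randrange(1 << 16), random.randrange(1 << 16)
    exact = a * b
    if exact:
        errs.append(abs(exact - approx_mul(a, b)) / exact)
mean_rel_err = sum(errs) / len(errs)
```

Because the mean relative error stays small while 2t partial-product bits per operand pair disappear from the circuit, error-tolerant workloads such as CNN inference can absorb the approximation with little effect on the output, which is the premise the abstract describes.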